The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
作为一种有希望的隐私机器学习方法,联合学习(FL)可以使客户跨客户培训,而不会损害其机密的本地数据。但是,现有的FL方法遇到了不均分布数据的推理性能低的问题,因为它们中的大多数依赖于联合平均(FIDAVG)基于联合的聚合。通过以粗略的方式平均模型参数,FedAvg将局部模型的个体特征黯然失色,这极大地限制了FL的推理能力。更糟糕的是,在每一轮FL培训中,FedAvg向客户端向客户派遣了相同的初始本地模型,这很容易导致对最佳全局模型的局限性搜索。为了解决上述问题,本文提出了一种新颖有效的FL范式,名为FEDMR(联合模型重组)。与传统的基于FedAvg的方法不同,FEDMR的云服务器将收集到的本地型号的每一层层混合,并重组它们以实现新的模型,以供客户端培训。由于在每场FL比赛中进行了细粒度的模型重组和本地培训,FEDMR可以迅速为所有客户找出一个全球最佳模型。全面的实验结果表明,与最先进的FL方法相比,FEDMR可以显着提高推理准确性而不会引起额外的通信开销。
translated by 谷歌翻译
对看不见的环境变化的深入强化学习的概括通常需要对大量各种培训变化进行政策学习。我们从经验上观察到,接受过许多变化的代理商(通才)倾向于在一开始就更快地学习,但是长期以来其最佳水平的性能高原。相比之下,只接受一些变体培训的代理商(专家)通常可以在有限的计算预算下获得高回报。为了两全其美,我们提出了一个新颖的通才特权训练框架。具体来说,我们首先培训一名通才的所有环境变化。当它无法改善时,我们会推出大量的专家,并从通才克隆过重量,每个人都接受了训练,以掌握选定的一小部分变化子集。我们终于通过所有专家的示范引起的辅助奖励恢复了通才的培训。特别是,我们调查了开始专业培训的时机,并在专家的帮助下比较策略以学习通才。我们表明,该框架将政策学习的信封推向了包括Procgen,Meta-World和Maniskill在内的几个具有挑战性和流行的基准。
translated by 谷歌翻译
3D视觉输入的对象操纵对构建可宽大的感知和政策模型构成了许多挑战。然而,现有基准中的3D资产主要缺乏与拓扑和几何中的现实世界内复杂的3D形状的多样性。在这里,我们提出了Sapien操纵技能基准(Manishill)以在全物理模拟器中的各种物体上基准操纵技巧。 Manishill中的3D资产包括大型课堂内拓扑和几何变化。仔细选择任务以涵盖不同类型的操纵挑战。 3D Vision的最新进展也使我们认为我们应该定制基准,以便挑战旨在邀请研究3D深入学习的研究人员。为此,我们模拟了一个移动的全景摄像头,返回以自我为中心的点云或RGB-D图像。此外,我们希望Manishill是为一个对操纵研究感兴趣的广泛研究人员提供服务。除了支持从互动的政策学习,我们还支持学习 - 从演示(LFD)方法,通过提供大量的高质量演示(〜36,000个成功的轨迹,总共〜1.5米点云/ RGB-D帧)。我们提供使用3D深度学习和LFD算法的基线。我们的基准(模拟器,环境,SDK和基线)的所有代码都是开放的,并且将基于基准举办跨学科研究人员面临的挑战。
translated by 谷歌翻译
Domain adaptive detection aims to improve the generalization of detectors on target domain. To reduce discrepancy in feature distributions between two domains, recent approaches achieve domain adaption through feature alignment in different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features in alignment, degrading detection. Addressing this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities including pixel-, instance-, and category-levels simultaneously to align two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. Besides, we introduce multi-granularity discriminators to identify where, either source or target domains, different granularities of samples come from. Note that, MGA not only leverages instance discriminability in different categories but also exploits category consistency between two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that explores model assessments for model update to improve pseudo labels and alleviate local misalignment problem, boosting detection robustness. Extensive experiments on multiple domain adaption scenarios validate the superiority of MGA over other approaches on FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
translated by 谷歌翻译
Function approximation (FA) has been a critical component in solving large zero-sum games. Yet, little attention has been given towards FA in solving \textit{general-sum} extensive-form games, despite them being widely regarded as being computationally more challenging than their fully competitive or cooperative counterparts. A key challenge is that for many equilibria in general-sum games, no simple analogue to the state value function used in Markov Decision Processes and zero-sum games exists. In this paper, we propose learning the \textit{Enforceable Payoff Frontier} (EPF) -- a generalization of the state value function for general-sum games. We approximate the optimal \textit{Stackelberg extensive-form correlated equilibrium} by representing EPFs with neural networks and training them by using appropriate backup operations and loss functions. This is the first method that applies FA to the Stackelberg setting, allowing us to scale to much larger games while still enjoying performance guarantees based on FA error. Additionally, our proposed method guarantees incentive compatibility and is easy to evaluate without having to depend on self-play or approximate best-response oracles.
translated by 谷歌翻译
Correlated Equilibrium is a solution concept that is more general than Nash Equilibrium (NE) and can lead to outcomes with better social welfare. However, its natural extension to the sequential setting, the \textit{Extensive Form Correlated Equilibrium} (EFCE), requires a quadratic amount of space to solve, even in restricted settings without randomness in nature. To alleviate these concerns, we apply \textit{subgame resolving}, a technique extremely successful in finding NE in zero-sum games to solving general-sum EFCEs. Subgame resolving refines a correlation plan in an \textit{online} manner: instead of solving for the full game upfront, it only solves for strategies in subgames that are reached in actual play, resulting in significant computational gains. In this paper, we (i) lay out the foundations to quantify the quality of a refined strategy, in terms of the \textit{social welfare} and \textit{exploitability} of correlation plans, (ii) show that EFCEs possess a sufficient amount of independence between subgames to perform resolving efficiently, and (iii) provide two algorithms for resolving, one using linear programming and the other based on regret minimization. Both methods guarantee \textit{safety}, i.e., they will never be counterproductive. Our methods are the first time an online method has been applied to the correlated, general-sum setting.
translated by 谷歌翻译
The task of referring video object segmentation aims to segment the object in the frames of a given video to which the referring expressions refer. Previous methods adopt multi-stage approach and design complex pipelines to obtain promising results. Recently, the end-to-end method based on Transformer has proved its superiority. In this work, we draw on the advantages of the above methods to provide a simple and effective pipeline for RVOS. Firstly, We improve the state-of-the-art one-stage method ReferFormer to obtain mask sequences that are strongly correlated with language descriptions. Secondly, based on a reliable and high-quality keyframe, we leverage the superior performance of video object segmentation model to further enhance the quality and temporal consistency of the mask results. Our single model reaches 70.3 J &F on the Referring Youtube-VOS validation set and 63.0 on the test set. After ensemble, we achieve 64.1 on the final leaderboard, ranking 1st place on CVPR2022 Referring Youtube-VOS challenge. Code will be available at https://github.com/Zhiweihhh/cvpr2022-rvos-challenge.git.
translated by 谷歌翻译
Referring image segmentation aims to segment the target object described by a given natural language expression. Typically, referring expressions contain complex relationships between the target and its surrounding objects. The main challenge of this task is to understand the visual and linguistic content simultaneously and to find the referred object accurately among all instances in the image. Currently, the most effective way to solve the above problem is to obtain aligned multi-modal features by computing the correlation between visual and linguistic feature modalities under the supervision of the ground-truth mask. However, existing paradigms have difficulty in thoroughly understanding visual and linguistic content due to the inability to perceive information directly about surrounding objects that refer to the target. This prevents them from learning aligned multi-modal features, which leads to inaccurate segmentation. To address this issue, we present a position-aware contrastive alignment network (PCAN) to enhance the alignment of multi-modal features by guiding the interaction between vision and language through prior position information. Our PCAN consists of two modules: 1) Position Aware Module (PAM), which provides position information of all objects related to natural language descriptions, and 2) Contrastive Language Understanding Module (CLUM), which enhances multi-modal alignment by comparing the features of the referred object with those of related objects. Extensive experiments on three benchmarks demonstrate our PCAN performs favorably against the state-of-the-art methods. Our code will be made publicly available.
translated by 谷歌翻译
Supervised learning methods have been suffering from the fact that a large-scale labeled dataset is mandatory, which is difficult to obtain. This has been a more significant issue for fashion compatibility prediction because compatibility aims to capture people's perception of aesthetics, which are sparse and changing. Thus, the labeled dataset may become outdated quickly due to fast fashion. Moreover, labeling the dataset always needs some expert knowledge; at least they should have a good sense of aesthetics. However, there are limited self/semi-supervised learning techniques in this field. In this paper, we propose a general color distortion prediction task forcing the baseline to recognize low-level image information to learn more discriminative representation for fashion compatibility prediction. Specifically, we first propose to distort the image by adjusting the image color balance, contrast, sharpness, and brightness. Then, we propose adding Gaussian noise to the distorted image before passing them to the convolutional neural network (CNN) backbone to learn a probability distribution over all possible distortions. The proposed pretext task is adopted in the state-of-the-art methods in fashion compatibility and shows its effectiveness in improving these methods' ability in extracting better feature representations. Applying the proposed pretext task to the baseline can consistently outperform the original baseline.
translated by 谷歌翻译